immediate post-program criterion test; however, overt responses were found more
effective on a delayed retention measure. When an overt response was required, it 
was more effective given as a sentence in context than as a single word. Considering 
the frequency and scheduling of reinforcers, increases in reinforcement decreased 
interfering effect on moderate or easy programs and that increases in the number
Abstract

The purpose of this project was to investigate some alternative 
methods of writing programed materials to enhance learning. The subject 
matter of the programed material was concerned with educational measurement.
A series of experiments was conducted to test alternative ways
of writing, arranging and responding to these materials. This abstract
presents a brief summary of each experiment and refers to the chapter 
in which further details can be obtained. The abstract concludes with 
some guidelines for program writers based on extrapolations from these 
research findings. At the end of the report appendices are included 
to illustrate the nature of the materials used in one of the experiments. 

Chapter I: To test the effect of overt vs. covert responding in
programed instruction, 54 undergraduates in educational psychology were
randomly assigned to four groups: a group who wrote down each response,
a group who "mentally composed" each response, a group who read the
program in which the blanks were already filled, and a control group
who wrote their answers to a completely different program of about the
same length. A 50-item test was administered following the study period,
and an alternate form two weeks later. The three response mode groups
did not differ significantly on the first test. However, on the delayed
test the written response group scored significantly higher than the
other two groups. The control group scored significantly lower on both
tests. Thus, overt responding appears to increase delayed retention.

Chapter II: The effect of intermittent confirmation was tested on
121 students by omitting various patterns of confirming answers from a
programed textbook on educational measurement. The schedules included
four levels of fixed-ratio confirmation and two of variable-ratio
confirmation. Results based on criterion measures consisting of (a)
errors made on the program and (b) performance on a posttest indicated
a negative linear relationship between the number of errors made on the
program and the percentage of confirmation provided, no significant
effects on the posttest from the various proportions of confirmation,
and no evidence of differential effect between fixed-ratio and
variable-ratio confirmation on either criterion.

Chapter III: An experiment was conducted comparing two approaches
to presenting the confirming response in a programed textbook designed
to teach prospective teachers how to write valid classroom achievement
tests. The "isolation" approach consisted of presenting the desired
response to the stimulus frame as a single word or phrase in the
traditional programed manner. The "context" approach consisted of presenting
the confirming response as a complete thought, usually by inserting the
desired response in a repetition of the relevant part of the stimulus
frame. A control group received a completely different program.
Thirty-two subjects were randomly assigned to the three treatment groups.

Although the "context" group did not exceed the "isolation" group
in knowledge of terminology on the criterion test, it did excel
significantly on ability to apply principles of test construction. If
confirmed by further research, this latter finding would support the
theoretical position that each contiguous pairing of stimulus and
response strengthens the association between them and would suggest a
modification in the manner of presenting the confirming response in
programed material.
Chapter IV: Fifty-three undergraduates and 67 graduate students
were randomly assigned to four groups: a standard key-word program
group that responded with an important concept to each frame, a
trivial-word program group that responded with a minor word, a paragraph-format
group that read the identical material written as textbook prose, and
a control group that studied and responded to a program covering
different topics. Parallel forms of a criterion test were administered
immediately after a two-day study period and again two weeks later.

In general, the standard key-word program group and the
paragraph-format group scored about the same, and both were higher than the
trivial-word program group. The results cast doubt on the importance
of requiring a response but emphasize the critical nature of any responses
that are required.

Chapter V: Six degrees of reinforcement in a 177-frame program
were provided by modifying both the number of questions asked and the
number of confirming answers. Findings: (a) The more reinforcement,
the fewer errors made on an immediate criterion test and on the program
itself. (b) The more reinforcement, the more Ss perceived the program
as interesting and a valuable learning experience. (c) When the same
criterion test was delayed two months for Ss not receiving the
immediate test, there was no statistically significant difference in criterion
test scores among Ss subjected to various reinforcement conditions.

Chapter VI: One experiment investigated the effect of typographical
cues on hard, medium, and easy programs. The medium form consisted of
a simple straightforward exposition of the main points, with examples.
The easy form was identical except that additional content hints and
cues were provided in order to make it more likely that the student
would answer correctly. The hard form was made more difficult by
complicating the sentence structure, by adding steps necessary to solve
the problems, by including irrelevant information, and by using less
familiar terminology. For those programs which were written without
typographical cues, the hard, medium, and easy programs had difficulty
levels of 22%, 7%, and 4%, respectively. For the cued programs, the
error rates for the hard, medium, and easy programs were 16%, 6%, and
3%, respectively. Typographical cues reduced error rate on the
program but improved immediate criterion-test performance only with a
difficult program. They interfered with criterion-test performance on a
medium difficulty-level program.

The second study attempted to determine the effects of two specific
variations in the program: adding irrelevant information and
complicating the sentence structure. On the average, the control programs
had an error rate of about 8%; irrelevant programs had an error rate
of about 18%; and the complex programs had an error rate of about 24%.
This study failed to reveal significant differences in learning in
spite of the fact that reliable differences in difficulty level had
been produced either by adding irrelevant information or by adding
complex wording and complex problems.

In the third study multiple-choice programs were constructed with
variations along three dimensions: (1) the plausibility or
implausibility of the alternatives; (2) the number of alternatives provided,
either two or four; and (3) the form of the answer, either recording the
letter of the correct alternative or writing out the correct alternative
in its entirety. While none of the differences in this study proved to
be very large, it was interesting to note a consistent thread throughout
all of them. The condition that produced the higher error rate also
tended to produce better performance on the criterion tests. Thus, for
example, providing more plausible alternatives on the program caused
students to make more errors on the program but tended to improve their
performance on the delayed constructed-response criterion test.
Requiring them to discriminate among four alternatives rather than two
produced a higher error rate but increased their performance on the
immediate multiple-choice criterion test. Asking students to write out a
complete answer rather than merely writing the letter of the correct
answer caused students to perform better on the multiple-choice
immediate criterion test even though it did not affect their error rate on
the program.

Chapter VII: A program can be judged as adequate when all members
of the target population demonstrate complete mastery of all the desired
behaviors at the time desired by the program writer. To begin work
toward this standard, we must first (a) state the terminal behaviors we
desire, (b) state when they will be needed, and (c) have a procedure
for assessing the degree to which each behavior has been mastered.

A perfectly adequate program will seldom be produced, so perfection 
must inevitably be compromised. But the way in which it is to be com- 
promised is of critical importance. One may wish to reduce the percent 
of the target population that one hopes to reach; one may reduce the 
The main findings were as follows:

1. The inductive group made more errors on the program, took less
time to answer test questions on rules, but liked their method of
instruction less than the deductive group liked theirs;

2. Neither method of teaching nor frequency of alternation
produced significant differences in scores on the criterion test;

3. Interaction effects with intelligence appeared only when the
criteria were amount of time taken to answer the test questions and
number of errors made on the program;

4. Correlations between number of program errors and test scores
were low and negative.

Guidelines for Program Writers 

Although generalizations are dangerous, an attempt will be made 
here to extrapolate from the results of these experiments and formulate 
some advice to program writers. It must be remembered that all findings 
were based on one type of subject matter (educational measurement) in 
one style of programing (linear) with a limited range of subjects (high
school and college students).

If the conclusions of these experiments are confirmed by other 
research, the following advice may be warranted: 

1. A program may not be needed. Sophisticated subjects learning 
only moderately difficult material may be able to learn just as well 
and more quickly from conventional text material. 

2. If you want subjects to retain what they learn, insist that
they write their responses to the programed material.

3. Provide reinforcing answers for every frame. Although no
differences between partial and continuous reinforcement occur on
criterion tests, students make fewer errors on the program itself and rate the
program as more interesting and valuable under continuous reinforcement.

4. Put reinforcing answers in context. Do not merely present the
answer as one word or number in isolation, but include it in a sentence.

5. Require that the student supply a critical, not a trivial,
answer. If the student can get the correct answer without being required
to engage in the desired thought processes, he will not attain the
educational goal as well.

6. In multiple-choice type programed materials include more than
two alternatives and make the incorrect alternatives sound plausible.

7. Use typographical cues sparingly and only in the most difficult
programs. In moderately difficult programs they may actually inhibit
learning.

8. Don't assume that the error rate on the program itself indicates
how well students will learn from it. Adding irrelevant information
and requiring more complex reading and problem-solving increase error
rate but not criterion performance. Additional cues and hints decrease
error rate but not criterion performance.

9. If you want students to express liking for the program,
sequence frames so that generalizations are presented prior to examples.
However, amount of learning on criterion tests appears unaffected by
the variations in sequence tried in the experiments.

10. Consider a program adequate when all members of the target
population are able to emit all the desired behaviors for an
appropriate length of time after instruction.
Factors Affecting Difficulty Level and
Criterion Performance



John D. Krumboltz and David Rawnsley 



One persistent controversy in the field of programed learning has
concerned the optimum difficulty level of the material to be learned.
Skinner has consistently advocated error-free learning, while others,
Pressey among the most articulate, have pointed out not only that it
is possible to learn by making errors but that it is often more
efficient too. Since the controversy concerned a question where empirical
evidence would be relevant, an experiment seemed a logical step to
settle the controversy. Why not do the crucial experiment that would
settle once and for all the effect of difficulty level on subsequent
learning and retention?

The problem seemed simple at first. All one would have to do would 
be to construct programs of different difficulty levels, randomly assign 
subjects to the different programs, and observe their criterion
performance. The simplicity of this study disappears immediately. How
do you write programs of different difficulty levels? The difficulty
level of programed materials is a function of many factors and cannot
be manipulated directly.

Difficulty level is a dependent variable, not an independent
variable, and may vary directly or inversely with criterion performance
depending on a number of independent variables which influence both
difficulty level and criterion performance.

All experiments were conducted with meaningful learning material
concerned with educational measurement. In general, programs were
written to teach teachers how to interpret educational and psychological
test scores. Topics covered included means and grade equivalent scores,
medians and percentile scores, the standard deviation, the normal curve,
validity and correlation. The general plan of all the studies involved
constructing materials corresponding to the main independent variables,
randomly assigning subjects to the one, two or three dimensional
factorial designs, and measuring performance once or twice thereafter.
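
The random assignment of subjects to the cells of a factorial design can be illustrated with a short sketch. This is not the procedure the authors report using; the function name, the subject labels, and the example 3 x 2 design (difficulty level crossed with typographical cues) are assumptions chosen for illustration only.

```python
import itertools
import random


def assign_to_cells(subjects, factors, seed=None):
    """Randomly assign subjects to the cells of a factorial design.

    `factors` maps a factor name to its list of levels; every combination
    of levels is one cell. Subjects are shuffled and then dealt out
    round-robin, so cell sizes differ by at most one.
    """
    rng = random.Random(seed)
    names = list(factors)
    cells = list(itertools.product(*(factors[n] for n in names)))
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {}
    for i, subject in enumerate(shuffled):
        assignment[subject] = dict(zip(names, cells[i % len(cells)]))
    return assignment


# Hypothetical 3 x 2 design: difficulty level crossed with typographical cues.
design = {"difficulty": ["Hard", "Medium", "Easy"], "cues": [True, False]}
groups = assign_to_cells([f"S{i:02d}" for i in range(1, 25)], design, seed=1)
```

Dealing the shuffled subjects round-robin keeps the cells balanced, which is a design choice of this sketch; the report does not say whether cell sizes were equalized.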

The first experiment in this series was similar to one that
James Holland reported to the American Psychological Association three
years ago. Two forms of a linear constructed response program were
prepared. One form, called the Key-word form, asked the student to fill
in some critical word or number which indicated that he had understood
the main point of the frame. The second form, called the Trivial-word
form, asked the student to fill in some minor preposition or other word
which would be obvious from the sentence structure even to people who
did not understand the main point of the frame. Both forms were
identical in all other respects. The study was performed twice, once with
graduate students and once with undergraduates. The results were
virtually identical for both groups. On the average the error rate on
the program was lower in the Trivial-word form (4.2%) than in the
Key-word form (6.3%). However, students taking the Key-word form produced
significantly higher learning both on criterion tests administered two
days after learning and also on an alternate form criterion test
administered two weeks after learning. The main conclusion is that reducing
the difficulty level of a program by asking for trivial responses on
the program reduces the amount of learning also.

The second experiment was designed to answer some rather obvious
criticisms of the first study. After all, no one has ever advocated
that a program ask for trivial responses. What would happen if everyone
were asked to make the same critical "key-word" response but were
given various kinds of cues to suggest the correct response? Three
forms of the program, labeled Hard, Medium and Easy, were constructed.
The Medium form consisted of a simple straightforward exposition of the
main points with examples. The Easy form was identical except that
additional content hints and cues were provided in order to make it more
likely that the student would answer correctly. The Hard form was
similar in that it still called for the same answer as the other two
forms and provided the same information, but obtaining the correct
answer was made more difficult by complicating the sentence structure,
by adding steps necessary to solve the problems, by including irrelevant
information, and by using less familiar terminology. So now we
had programs written at three levels of difficulty. We added a second
dimension to this study by adding typographical cues to half the
programs at each difficulty level. The typographical cues consisted of
such things as giving the first and/or last letters of the correct
answer, and providing underlining to indicate the number of letters in
the correct answer. The error rate on the program itself indicated
clearly that we had been successful in producing programs of different
difficulty levels. For those programs which were written without
typographical cues, the Hard, Medium, and Easy programs had difficulty

then, we found that typographical cues reduce error rate on the program
but improve criterion test performance only with a difficult program.
They interfere with criterion test performance on a medium difficulty
level program.
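
The typographical cues used in this experiment (the first and/or last letter of the correct answer, with underlining to show the number of letters) can be sketched as follows. The function name and its parameters are hypothetical, and the exact layout of the cued blanks in the original booklets is not reproduced in this report, so the spacing shown is an assumption.

```python
def cue_blank(answer, first=True, last=False):
    """Render a response blank for a one-word answer with typographical cues.

    Each letter becomes an underscore, so the number of letters is visible;
    the first and/or last letter is printed in place when requested.
    """
    chars = []
    for i, ch in enumerate(answer):
        show = (first and i == 0) or (last and i == len(answer) - 1)
        chars.append(ch if show else "_")
    return " ".join(chars)


print(cue_blank("median"))               # m _ _ _ _ _
print(cue_blank("median", last=True))    # m _ _ _ _ n
print(cue_blank("median", first=False))  # _ _ _ _ _ _
```

An uncued program would instead show a plain blank of fixed length, which gives the student no information about the spelling or length of the answer.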

The third study was an attempt to define more precisely the speci- 
fic variations in the program which cause it to be difficult. Two 
methods of making the program more difficult were subjected to experi- 
mental manipulation and test. The first method consisted of adding to 
each frame in the program one item of irrelevant information, producing 
what we may call our Irrelevant program. An item of irrelevant infor- 
mation was defined as some statement which was related to the topic 
under consideration but which was not necessary in order to solve the 
problem or answer the question in that frame. The second method, 
resulting in our Complex programs, consisted of requiring the student 
either to go through an additional step in order to solve the problem 
or to read deliberately complicated wording in the frames. The Control 
programs were written in simple straightforward language without any 
unnecessary complications or irrelevant information. 

In order to obtain replication of results, three different content
areas within educational measurement were programed by each of the three
methods. Thus, nine different sections were prepared: three different
content areas written in each of three different manners, Irrelevant,
Complex, and Control. Then by combining three sections at a time in
various combinations in accordance with a Latin-square design, nine
different forms of the material were assembled. Each student received
a booklet which contained all three content areas and all three methods
of presentation, but the particular combination and order of method
and content for a particular student was assigned to him according to
an elegant Latin-square which corresponded to Lindquist's Type V mixed
design. Analysis of the error rate on the program revealed that we had
indeed prepared programs of different difficulty levels in all three
content areas. On the average, the Control programs had an error rate
of about 8 percent; Irrelevant programs had an error rate of about 18
percent; and the Complex programs had an error rate of about 24 percent.
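
One way in which nine booklet forms could be assembled from nine sections under a Latin-square arrangement is sketched below. The report does not give the actual square, so the cyclic construction, the placeholder content-area names, and the function names are all assumptions; the sketch shows only that such an arrangement lets every form contain all three content areas and all three methods.

```python
METHODS = ["Irrelevant", "Complex", "Control"]
CONTENT = ["Area A", "Area B", "Area C"]  # placeholder content-area names


def cyclic_shift(items, k):
    """Return a copy of `items` rotated left by k positions."""
    k %= len(items)
    return items[k:] + items[:k]


def build_forms():
    """Build nine forms, each an ordered list of (content, method) sections.

    Rotating the methods against the content areas gives three pairings in
    which each area gets a distinct method; rotating the order of the three
    sections gives three serial orders for each pairing.
    """
    forms = []
    for method_shift in range(3):
        pairs = list(zip(CONTENT, cyclic_shift(METHODS, method_shift)))
        for order_shift in range(3):
            forms.append(cyclic_shift(pairs, order_shift))
    return forms


forms = build_forms()
for number, form in enumerate(forms, start=1):
    print(number, form)
```

In this arrangement each of the nine sections appears exactly once in each serial position across the nine forms, so method, content, and order are balanced against one another.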

The criterion test was administered two days after completing the
program. The criterion test consisted of nine 5-item sub-tests. Each
sub-test covered one of the three content areas and was constructed by
one of the same three methods of writing: Irrelevant, Complex, or
Control. The resulting four-dimensional analysis of variance revealed
no significant differences on the criterion test attributable to the
different treatments. In summary then, this study failed to reveal
significant differences in learning in spite of the fact that reliable
differences in difficulty level had been produced by either the adding
of irrelevant information or the adding of complex wording and complex
problems.

The fourth study was an attempt to determine whether variations in
multiple-choice type programs might produce different results than
constructed response programs. Within the multiple-choice programs we
constructed variations along three dimensions: (1) the plausibility or
implausibility of the alternatives, (2) the number of alternatives
provided, either two or four, and (3) the form of the answer, either
recording the letter of the correct alternative or writing out the
correct alternative in its entirety.



We found large consistent differences in the error rate on the
program between the plausible and implausible alternative programs. As you
would expect, the program was more difficult when students were asked
to discriminate between more plausible alternatives (14 percent). When
the alternatives were less plausible, students made few errors on the
program (4 percent). The number of alternatives also made a slight
difference in error rate, with four-choice alternative frames proving
slightly more difficult (10 percent) than two-choice alternatives
(8 percent). The form of the answering, whether by letter or by writing
out the entire answer, did not produce significant differences in
difficulty level.

Two types of criterion tests were prepared. One was a multiple-choice
criterion test and the other was a constructed response criterion
test. Alternate forms of each were prepared, one administered
immediately after learning, the other two weeks later. The results of the
criterion tests as based upon a preliminary analysis were not highly
significant. On the immediate multiple-choice criterion test, writing
out the answer proved to be more effective than writing merely the
letter of the correct alternative (p < .05). Four alternatives proved
better than two alternatives (p < .10). There was a tendency for the
plausible program to produce better responses than the implausible but
this did not reach conventional levels of significance. On the
constructed response immediate test, none of the differences approached
conventional significance levels. On the delayed multiple-choice test
two weeks later, the plausible alternatives program appeared better
than the implausible (p < .20). On the delayed constructed response
test, the plausible program appeared better than the implausible
(p < .15). All other differences were non-significant.

While none of the differences in this study proved to be very 
large, it was interesting to note a consistent thread throughout all of 
them. The condition that produced the higher error rate also tended to 
produce better performance on the criterion tests. Thus, for example, 
providing more plausible alternatives on the program caused students to 
make more errors on the program but tended to improve their performance 
on the delayed constructed response criterion test. Requiring students 
to discriminate among four alternatives rather than two produced a 
higher error rate but increased their performance on the immediate 
multiple-choice criterion test. Asking students to write out a complete 
answer rather than merely writing the letter of the correct answer 
caused students to perform better on the multiple-choice immediate 
criterion test even though it did not affect their error rate on the
program.

Although the results of all these experiments will need to be
replicated by others before we can assign much confidence to them, it is
interesting to summarize the trends that have been revealed so far. We
can construct programs in many ways in order to vary their difficulty
level. Some of these ways increase criterion performance, some decrease
criterion performance, and for some we have been unable to demonstrate
any effect in either direction.

1. What factors affect difficulty level without appearing to
affect criterion performance? Adding irrelevant information and
requiring more complex reading and problem solving increase error rate,
but we have not found a significant effect on criterion performance.
Providing additional clues and hints decreases error rate but has not
been found to affect criterion performance.

2. What factors increase the error rate and also increase criterion
performance? Increasing both the error rate and criterion performance
may be done by asking the student to discriminate among plausible
Appendix A



Inductive Mixed Arrangement of 
Programed Booklet 
Name of student



PROGRAMED BOOKLET 
Interpreting Test Results Part 1 



Form IND-M 



John D. Krumboltz 
William W. Yabroff 
Stanford University

Note: Read Instructions Carefully

DO NOT WRITE IN THIS BOOKLET
How to Use this Programed Booklet

You are about to begin studying by a new method. Programed
books are not arranged like ordinary books. To use this booklet,
follow these steps carefully:

1. Turn to page one. Read the words in the top box (Frame 1)
only. Think of some response which completes the blank space
in Frame 1.
Write down that response on your answer sheet.

2. Turn to page 2. Do not read any more on page 1 now. In
the top left box on page 2, you will find the correct answer
to Frame 1.

3. If your response agrees with the correct answer, write a
plus sign (+) next to your response on the answer sheet.
If your response does not agree, write a minus sign (-).

4. Now read Frame 2 at the top of page 2. Write down a response
to Frame 2 on your answer sheet and score it + or - according
to the correct answer given at the top left of page 3.

5. Continue in this manner to the last page of the booklet. The
answer to this frame is found back on page 1 in the next to
the top square on the left hand side.

Be sure you always write down your answer before looking at
the answer in the booklet.



After completing all the frames in the booklet, return the booklet 
and the answer sheet to your instructor. 



DO NOT WRITE IN THE PROGRAMED BOOKLET 
Important

The following symbols tell you what kind of a response is needed.

1. The number of words needed to complete an item is indicated
by the number of blanks. Thus, ______ indicates a one-word
response, whereas ______ ______ indicates a two-word response.
When you see three asterisks (* * *), you are to use as many
words as you think necessary to respond to the item.

2. The number sign (#) indicates that the desired response is
a number.

3. The abbreviation (TT) calls for a technical term. When it is
used, a nontechnical word is wrong.

4. Often several responses are equally good, even though all of
them are not listed in the answer box. This is particularly
true when the response is nontechnical. Use reasonable
judgment in deciding whether your response means the same
as the correct answer. Score it correct if it does.

DO NOT WRITE IN THE PROGRAMED BOOKLET
(1) Items

(questions ) 



(2) group 
CONTINUE ALONG 



SECOND ROW 

24 



On the 5th grade spelling test, George
spelled 20 words correctly. George's
raw score is 20. Mary spelled 27 words
correctly. Mary's raw score is __ (#).

DO NOT READ ANY MORE ON THIS PAGE NOW. TURN
TO THE TOP OF PAGE 2 TO CONFIRM YOUR ANSWER.



By giving the percent of the scores that 
fall below a given score, a percentile 
score tells the relative standing of a 
(1) ______ with some (2) ______.



25 



(1) 6

(2) the same 




Suppose that Philbrite obtained an unusual 
score of 27 rather than the usual score of 
11 for a bright 3rd grader, so that scores 
for the 3rd grade class were: 

Nancy 1 
Jack 5 
Roger 7 
Philbrite 27 

The mean and median scores are (the same,
different).



49 



To avoid the misunderstandings which arise
from the misuse of grade equivalent scores,
(1) ______ (TT) scores (properly
used) would compare the performance of an
(2) ______ with the performance of
some (3) ______ (TT) to which he belongs
or aspires to belong.



11 



Frank obtained a raw score of 11 on the
Delta reading test. Frank's percentile
score is (1) ______ (#). How many
7th graders in the standardization sample
scored less than Frank? (2) ______ (#).
ANSWER 27
NOW READ FRAME
2 RIGHT HERE



A raw score of 18 means that an individual
obtained ______ (#) correct answers on
a test.

NOW TURN TO THE TOP OF PAGE 3 TO CHECK YOUR
ANSWER.



(1) person 

(2) group 



25 



different 



Trudy took the College Entrance Examination at
Little Run College and at Honors College. She
obtained the following scores:

Little Run College          Honors College
percentage score = 85       percentage score = 90
percentile score = 88       percentile score = 75

At which college did Trudy score highest among
the other applicants who took the Entrance
Examination? ______



Two types of average scores are (1) ______ (TT)
and ______ (TT) scores which have about the
same value in a (2) ______ distribution.
In an unusual distribution of scores, the
mean is (3) (* * *)
50



(1) percentile 

(2) individual 

(3) group 
(1) *0
(2) 400 



Suppose you wished to know how well 5th graders
perform on the Sanford Achievement Test.
One way would be to give this test to the
entire 5th grade population and compute an
average score. However, it is seldom if ever
possible to test the entire ______ (TT) of
5th graders.



74 



A 7th grade boy would have to obtain a 
raw score of OJO to score at the 

median of the standardization sample. 
The number of correct answers obtained on a
test is known as the ______ (TT) score.


CONTINUE READING 

1 
THE TOP FRAMES ON
EACH PAGE



NEVER READ TWO FRAMES CONSECUTIVELY ON THE 
SAME PAGE 



Little Run 
College 



Honors College admits 570 new students a year,
and rejects 64 percent of their applicants on
the College Entrance Examination. The lowest
acceptable percentile score earning admission
would be ______ (#).



26 



(1) mean, median

(2) usual (or
typical)

(3) more affected
by extreme
scores (or
words to that
effect)



27 



Below are the results of Miss Reallybusy's findings
for grades 3, 5, and 7 on the Jones
Arithmetic Test:

Grade 3        Grade 5          Grade 7
Mean = 6       Mean = 11        Mean = 16
Median = 6     Median = 10.5    Median = 16.5

Both the mean and median scores for each grade
report two kinds of ______.

51 



population 



It is possible to give the Sanford Achievement 
Test to different 5th grade classes, or samples 
of 5th graders. By taking different (1) ____ 
(TT) we would be able to describe how well 
5th graders as a whole, or technically the 5th 
grade (2) ____ (TT), might be expected to 
perform on the test. 



13 
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Thus, norms allow us to compare an (l ) 

score to the scores of the (2) 

(TT) from which the norms were 

computed . 
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CONTINUE 

raw 



Frank obtained a raw score of 20 on a 50-item 
English test. His percentage score is obtained 
as follows: 

20 
-- x 100 = 40 
50 

Mary obtained a raw score of 10 on the same 
test. Her percentage score is ____ (#). 
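The percentage-score rule in this frame can be written as a short function. This is only an illustrative sketch: the function name `percentage_score` is hypothetical and does not appear in the booklet, and the only worked figure taken from the frame is Frank's score.

```python
def percentage_score(raw_score: int, total_items: int) -> float:
    """Return the percent of test items answered correctly."""
    if total_items <= 0:
        raise ValueError("total_items must be a positive number")
    # Multiply before dividing so whole-number results stay exact.
    return raw_score * 100 / total_items


# Frank: raw score of 20 on a 50-item test, as in the frame above.
print(percentage_score(20, 50))
```

The same call can be used to check any other raw score against the number of items on the test.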



6 “ 



3 



A score that gives the percent of the scores 
falling below a given score is called a 
(1) ____ (TT) score, which tells the 
relative standing of a person with some 
(2) ____. 



4 







averages 



If Philbrite in the 
unusual score of 27 



28 




Miss Reallybusy would have found the following: 

Grade 3         Grade 5          Grade 7 

Mean = 10       Mean = 11        Mean = 16 
Median = 6      Median = 10.5    Median = 16.5 

The mean score for the 3rd grade is now very 
close to the mean score of the (1) ____ grade. 
The (2) ____ score for the 3rd grade has not 
changed in spite of Philbrite's performance. 
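The effect of one extreme score on the two averages can be checked directly. The sketch below assumes the four 3rd grade raw scores listed elsewhere in the booklet (Nancy 1, Jack 5, Roger 7, Philbrite 11) and replaces Philbrite's score with the unusual score of 27; the helper name `both_averages` is hypothetical.

```python
from statistics import mean, median

third_grade = [1, 5, 7, 11]          # Nancy, Jack, Roger, Philbrite
with_extreme_score = [1, 5, 7, 27]   # Philbrite's score replaced by 27


def both_averages(scores):
    """Return (mean, median) for a list of raw scores."""
    return mean(scores), median(scores)


print(both_averages(third_grade))
print(both_averages(with_extreme_score))
```

Only the mean moves when the extreme score is introduced; the median depends on the middle of the ranked scores and stays where it was.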



(1) samples 

(2) population 



In this case, the various 5th grade classes or 
the ____ (TT) of 5th graders are selected 
from all possible 5th graders, to find out 
how well 5th graders in general might be 
expected to perform on the Sanford Achievement 
Test. 



1 
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(1) individual* s 

(2) standard- 
ization sam- 



Now turn to Panel #3, page 27. Panel #3 
reports (1) ____ (TT) norms for a 
standardization sample of (2) ____ (#) 7th 
grade girls on the Delta Reading Test. 



pie 




100 



V 







20 (10/50 X 
100 = 20) 


Larry Bright missed two answers on a 75-item 
quiz. Larry's raw score is (1) ____ (#). 

His percentage score is obtained by dividing 


N, 

/ 


75 by ( 2 ) \rr ) then multiplying oy 

ALWAYS TURN TO THE NEXT PAGE 


4 


5 


(l) percentile 


In the following test scores: 


(2) group 


12 

13 

15 

% 


28 


the score at the median is 15. There are 
____ (#) scores above and below 15. 


(1) 5th 


If Miss Reallybusy reported that the 3rd grade 
performed as well as the 5th grade on the 


(2) median 


Jones Arithmetic Test, what kind of average 
score would she have used in this report? 
____ (TT) 


52 


53 


samples 


Thus, a (1) ____ (TT) is used to describe 
or obtain information about a (2) ____ (TT) 
and is comprised of subjects selected from 
that (3) ____ (TT). 


76 


77 


(l) percentile 


A girl would have to obtain a score of at 
least (#) to fall at the 95th 


(2) 800 

100 


percentile. 

81 

101 



100 
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(1) 73 

(2) 73 

AND SO ON — 



-> 



(1) sample 

(2) population 

(3) population 



77 




29 



101 



A percentage score is the (1) ____ (TT) of 
correct answers on a test, whereas a raw 
score is the (2) ____ of correct answers. 



YOU WILL NEVER GET LOST IF YOU FOLLOW THESE 
FRAME NUMBERS 



6 



In scores 

7     10     11     14     19 

the score at the median is ____ (#). 



30 



In a typical distribution of test scores, the 
(1) ____ (TT) and ____ (TT) have 
about the same value. The (2) ____ (TT) 
is affected more by an occasional extreme 
score. 



54 



A foreign observer found that high school students 
in Southville County averaged one hour 
of homework each night. He was asked to make 
a statement about the amount of homework done 
by high school students in the United States. In 
this case, the sample used was high school 
students in (1) ____. The population to 
be described was high school students in 
(2) ____. 

78 



Mary obtained a raw score of 20 on the Delta 
Reading Test. How many girls in the 
standardization sample scored below Mary? 
____ (#). 



82 
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(1) percent 

(2) number 



Susan answered 16 out of 20 test items 
correctly. Her percentage score is 
____ (#). 



CORRESPONDING 
ANSWER NUMBER ^ 



11 



30 



(1) mean, median 

(2) mean 



34 



(l) Southville 
County 



(2) the United 
States 



78 



600 



102 



When scores are not arranged in order, such as 
3 9 8, 4, 11, the median is found by ranking 
the scores from lowest to highest as follows: 

Scores Rank Order 

“1 — n 

! ! 

11 3 

The median Is (#)• 21. 



Books read per month in Southville School 

7th grade                  8th grade 

 2 students read 25         3 students read 3 
13 students read 1         14 students read 1 
 5 students read 0          3 students read 0 

What kind of average score would report both 
classes to have read an equal number of books 
per month? (1) ____ (TT). This score indicates 
that each class read (2) ____ (#) books per 
month. 



Before making a statement about the amount 
of homework done by high school students in 
the United States, it would be wise for the 
observer to have many high 
school students from different parts of the 
country. 



I 



I 



79 



Norms allow us to compare an individual's 
score on a test to the scores of the 
____ ____ (TT) from which 
the norms are computed. 



83 



103 
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(1) median 

( 2 ) 1 



55 



samples 



79 



8 



In an 80-question History exam, Frank obtained 
a percentage score of 70. His raw score 
was ____ (#). If you have read this 
far, circle number 8 on your answer sheet like 
this: 



8 



A score is said to be at the median when 
( * * * ) 



32 



The Southville Daily Times wrote a stirring 
editorial claiming that the 7th grade class in 
Southville School read almost three times as 
many books per month as the 8th grade class. 
Using the same figures as on the previous frame, 
what kind of average score did the newspaper use 
to make this observation? (1) ____ (TT). This 
score indicated that the 7th grade read on the 
average (2) ____ (#) books per month while 
the 8th grade averaged (3) ____ (#) books per month. 
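The two kinds of average for the Southville book counts can be worked out from the frequency table given with the previous frame (7th grade: 2 students read 25, 13 read 1, 5 read 0; 8th grade: 3 read 3, 14 read 1, 3 read 0). The sketch below is illustrative only; the helper `expand` is a hypothetical name, not part of the booklet.

```python
from statistics import mean, median


def expand(frequencies):
    """Turn {books_read: number_of_students} into one value per student."""
    return [books for books, students in frequencies.items()
            for _ in range(students)]


seventh_grade = expand({25: 2, 1: 13, 0: 5})
eighth_grade = expand({3: 3, 1: 14, 0: 3})

for label, books in (("7th grade", seventh_grade), ("8th grade", eighth_grade)):
    print(label, "mean:", round(mean(books), 2), "median:", median(books))
```

Two heavy readers in the 7th grade are enough to pull that class's mean far above its median, which is why the choice of average changes the story the newspaper can tell.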



It is seldom possible to obtain information 
by testing an entire population. It is 
possible to use (1) ____ (TT) to describe 
certain characteristics of a (2) ____ (TT). 



80 



standardization 
sample 



Refer again to Panels #2 and #3. In comparing 
raw scores needed to obtain a percentile 
score of 95 on the Delta Reading Test, would 
a boy or a girl have to obtain a higher raw 
score to be at the 95th percentile? ____ 



84 
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56 


To obtain a percentage score, divide the 
(1) ____ (TT) score by the (2) ____ 
number of test items, then multiply by 
(3) ____ (#). 


8 


9 


there are an 
equal number of 
scores above and 
below that score, 
(or words to 
this effect) 


Arrange the following even number of scores 
in rank order: 

Scores    Rank Order 

19        ____ 
27        ____ 
12        ____ 
15        ____ 

What number would represent a median score? 
____ (#) 

33 


(1) mean 

(2) 3 or 3.15 

(3) 1 or 1.15 


The mean and median are two kinds of averages 
which may be the same (or close together) in a 
usual set of scores, but may be ____ 
in an unusual set of scores. 


56 


57 


(1) samples 

(2) population 


He might find that students in Northville average 
3 hours of homework per night, while students 
in Westville averaged 2.5 hours, etc. It is 
obvious that these samples are ____. 

If you have read this far, circle number 81 
on your answer sheet. 


80 


81 


girl 


Frank and Susan both obtained raw scores of 
21 on the Delta Reading Test. When compared 
to their standardization samples, who obtained 
the higher percentile score? ____ 

85 


104 


105, 
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(1) raw 

(2) total 

( 3 ) 100 



(1) 27, 19, 15, 
12 



(2) 17 



33 



different 



57 



different 



81 



Susan 




105 



Frank's percentage score of 70 on the 80-item 
History Quiz enables us to compute the 
(1) ____ of correct answers he obtained 
on the test. Do we know how well Frank 
did in comparison to other members of his 
class? (2) ____. 



10 



In the same set of scores: 

12     15     19     27 

the median of 17 is halfway between the two 
middlemost scores, 15 and 19. The median 
of 17 is a midpoint at which ____ (#) 
scores fall above and below the median. 
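The ranking procedure for the median, including the halfway rule for an even number of scores, can be sketched as follows. The function name `median_score` is hypothetical; the example uses the four scores from this frame.

```python
def median_score(scores):
    """Rank the scores and return the midpoint of the ranked list."""
    if not scores:
        raise ValueError("at least one score is required")
    ranked = sorted(scores)
    middle = len(ranked) // 2
    if len(ranked) % 2 == 1:
        # Odd number of scores: the median is the middle score itself.
        return ranked[middle]
    # Even number of scores: halfway between the two middlemost scores.
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median_score([19, 27, 12, 15]))
```

Ranking comes first in both cases; only the last step differs between an odd and an even number of scores.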



34 



Turn to Panel #1 on page 25. Which 7th 
grade student obtained a raw score equivalent 
to the mean of the 7th grade? (1) ____. 

Thus, a raw score of (2) ____ (#) would 
have a grade equivalent of 7, because it is 
equal to the (3) ____ (TT) score of the 
7th grade. 



58 



Since it would be practically impossible to 
know the homework time spent by every high 
school student in the United States, the 
sample of students in Southville might be 
compared with other ____ (TT) of 
students throughout the country. 



82 



If you were assigning an A to boys and girls 
at or above the 90th percentile on the Delta 
Reading Test, what minimum raw score would a 
girl have to obtain? (1) ____ (#). A boy? 
(2) ____ (#). 

86 






106 
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(1) number 

(2) no 



(1) James 

(2) 30 

(3) mean 



samples 



11 



10 



34 



58 



82 




In the 4th grade class of 30 pupils, Trudy 
answers 15 out of 30 test items correctly. 
Trudy's percentage score is (1) ____ (#). 

Do we know how well Trudy did in comparison 
to the other 30 pupils? (2) ____. 



11 



Arrange the following even number of scores 
in rank order and find the median: 

Scores    Rank Order 

9         47 
3         ____ 
13        ____ 
47        ____ 

The median is (1) ____ (#). The two middlemost 
scores are (2) ____ (#) and ____ (#). 



33 



A student with a raw score of 15 on the 
Sanford Achievement Test did as well as the 
average of the (1) ____ (#) grade, and has 
a grade equivalent of 3, because 15 is the 
(2) ____ (TT) score made by 3rd graders. 



59 



Thus, a group of subjects selected from a 
population for the purpose of obtaining 
information or describing that population is 
known as a ____ (TT). 



83 



Thus, when standardization samples (1) ____ 
significantly from each other, more than 
one set of (2) ____ (TT) should be given 
for a test. 



87 
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f 



s 



i 






1 












12 



(1) 50 

(2) no 


The percent of correct answers obtained on 
a test, found by 

 raw score 
----------- x 100 
total score 

is the ____ (TT) score. 



Rank Order 

(1) 11 

(2) 9, 13 



(1) 3rd 

(2) mean 



11 



47 

13 

9 

3 



12 

Thus, the median in an even number of scores 
is (1) ____ between the two (2) ____ 
scores. 



35 



A student who obtained a raw score of 20 
did as well as the average of the (l ) 
grade, and has a grade equivalent of 






sample 



59 



60 



Miss Reallybusy reported that her 3rd grade 
class norm on the Sanford Achievement Test 
was a mean score of 45. The standardization 
sample from which the class norm of 45 
was computed is the ____. 



83 84 

(1) differ 

(2) norms 



88 




It is useful to have separate (1) ____ (TT) 
for boys and girls on the Delta Reading 
Test because the two sexes make significantly 
(2) ____ scores. If you have read 
this far, circle number 108 on your answer 
sheet. 












percentage 



12 



(1) halfway 

(2) middlemost. 



36 



(1) 5th 

( 2 ) 5 



60 



3rd grade 



84 



(l) norms or 
percentile 
norms 



(2) different 



108 




On the World Geography Test, 15 out of 20 
students (75 percent of the class) scored 
lower than George. His percentile score is 
75. Ten students scored lower than Susan. 
What percent of the class scored below 
Susan? (1) ____ (#). Her percentile score 
is (2) ____ (#). 
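A percentile score, as this frame uses the term, is the percent of the group scoring below a given score. The sketch below is a minimal illustration: the function name `percentile_score` and the class scores 1 through 20 are hypothetical stand-ins, chosen only so that exactly 15 of the 20 scores fall below George's.

```python
def percentile_score(score, group_scores):
    """Percent of the scores in the group that fall below the given score."""
    if not group_scores:
        raise ValueError("the group must contain at least one score")
    below = sum(1 for other in group_scores if other < score)
    return below * 100 / len(group_scores)


# Hypothetical class of 20 with raw scores 1..20 (not from the booklet).
class_scores = list(range(1, 21))
george = 16  # 15 of the 20 scores are lower
print(percentile_score(george, class_scores))
```

The result depends entirely on the group supplied, which is the point the booklet goes on to make about norms and standardization samples.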



13 



In scores 3, 4, 4, 4, 8, 11, 11, 12, 
applying the definition you have already 
learned about the median, would you count 
the duplicate scores 4 4 4 and 11 11 as 
separate scores? (1) ____. At what percentile 
would the median fall? (2) ____ (#). 
The median for the above set of scores is 
(3) ____ (#). 



37 



A grade equivalent indicates (* * * ) 



61 



The superintendent of Southville School District 
reported that the Southville High 
School's senior class scored at the 83rd percentile 
on the District Norm for the Sanford 
Achievement Test. Senior classes in the 
Southville School District were used as the 
____ ____ (TT) for the 
district norm. 



81 



Might it be useful to have more than one 
set of norms for the Jones Arithmetic Test 
when comparing scores obtained by Miss 
Reallybusy's students in Northville and 
scores obtained by students in Far-behind 
School at Reservationville? ____ 



89 



109 



m 



£1 















14 









(1) 50 

(2) 50 



14 

In the same class of 20 students, Frank 
obtained a percentile score of 90 on the 
World Geography Test. What percent of the 
class scored lower than Frank? ____ (#). 



(1) yes 

(2) 50th 

( 3 ) 6 



13 



37 



which grade obtained 
a mean score 
closest to 
an individual's 
score (or words 
to this effect) 



14 



In the following set of scores: 

7      19 
7      100 
8      100 
15     100 

The median is (1) ____ (#). The 50th percentile 
is (2) ____ (#). 



38 



Since Miss Reallybusy administered the Sanford 
Achievement Test at the beginning of the 
school year, a raw score of 15 is equivalent 
to the mean of beginning 3rd graders. Thus, 
the ____ ____ (TT) may be more 
precisely expressed as 3.0. 



61 



standardization 

sample 



85 



yes 



62 

The test publishers of the Sanford Achievement 
Test report a table of national percentile 
norms for high school seniors. What 
standardization samples were probably used to 
compute these norms? ( * * * ) 

86 

More than one set of norms should be given 
for a test when the standardization samples 
____ significantly from each other. 



109 



90 



110 
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90 



Thus, you might say that a percentile score 
is the percent of ( * * * ). 



14 



(1) 17 

(2) 17 



15 



In Frank's class of 24 students, Frank 
scored 18 on the Know Your Geography Quiz. 
12 students scored less than Frank. The 
median score for Frank's class is (1) ____ (#). 
At what percentile did Frank score? 
(2) ____ (#). 



38 



grade equivalent 



62 



senior classes 
throughout the 
nation (or 
words to that 
effect) 



39 

If Miss Reallybusy had administered the test 
after two months of the school year had passed 
and found that 15 was the mean raw score made 
by 3rd graders, then 3.2 would be the 
____ ____ (TT). 



The sample of subjects on whose scores the 
(1) ____ (TT) of a test are computed is 
known as a (2) ____ ____ (TT). 



86 



differ 



110 



87 



Susie obtained a raw score of 58 on the College 
Entrance Test. When she applied for 
admission to Outcast, Somefun and Honors 
colleges, she learned that her raw score of 
58 earned a different percentile score at 
each college. Susie's raw score was compared 
with at least (1) ____ (#) different sets 
of (2) ____ (TT). 



V 
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^v^smtpsm 



16 



the scores that 
fall below a 
given score 
(or words to 
this effect) 



Turn to Panel #2 on page 26. The raw score 
of 13 would have a percentile score of 
(#) on the Delta Reading Test. 



(1) 18 

(2) 50th 



15 



16 



39 



A score is said to be at the median when there 
are an (1) ____ number of scores above and 
below that score. The median is at the 
(2) ____ (#) percentile. 



40 



grade equivalent 



If a raw score of 32 had a grade equivalent of 
7.6, we would know that 32 was the (l) (TT) 

score obtained by the (2) grade after 

(3) (#) months of school had passed. 




(1) norms 

(2) standardization 
sample 



Miss Reallybusy's 3rd grade class norm on 
the Sanford Achievement Test was a mean 
score of 45. The class norm computed from 
the scores of the 3rd grade standardization 
sample is the (1) ____ (TT) score of 
(2) ____ (#). 



64 






■1 









87 

(1) 3 

0 

(2) norms 



111 



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Susie's percentile scores for the College 
Entrance Test were as follows: Outcast College, 

64; Somefun College, 72; Honors College, 45. 

At which college would Susie compare most 
favorably to other applicants for admission? 
(1) ____. Least favorably? (2) ____. 




II 



On Panel #2, what percent of the scores are 
below the 60th percentile? (l) (#). What 

percent of the scores are below the raw 
score of 10? (2) (#). 



(1) equal 

(2) 50th 



16 



17 



The mean of scores 2, 4, and 9 is obtained 
as follows: 

2 + 4 + 9 
--------- = ____ (#) 
    3 



(1) mean 

(2) 7th 

( 3 ) 6 



40 



41 



A grade equivalent indicates which (1) ____ 
obtained a (2) ____ (TT) score closest 
to an individual score. 
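The definition in this frame amounts to a nearest-mean lookup. The sketch below assumes the grade means reported in Panel #1 (15 for grade 3, 20 for grade 5, 30 for grade 7); the function name `grade_equivalent` is hypothetical, and the rule for a raw score exactly halfway between two means (the lower grade wins here) is an arbitrary choice, not something the booklet specifies.

```python
def grade_equivalent(raw_score, grade_means):
    """Return the grade whose mean score is closest to the raw score."""
    # Ties go to the lower grade because grades are checked in sorted order.
    return min(sorted(grade_means),
               key=lambda grade: abs(grade_means[grade] - raw_score))


panel_1_means = {3: 15, 5: 20, 7: 30}  # grade -> mean raw score (Panel #1)

print(grade_equivalent(15, panel_1_means))
print(grade_equivalent(30, panel_1_means))
```

Note that the comparison group is whichever grade happens to have the nearest mean, not necessarily the grade the student belongs to.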



(1) mean 

(2) 45 



64 




The percentile score of 83 obtained by the senior 
class of Southville High was found by 
comparing the scores of the senior class to 
other senior classes in the Southville High 
School District. The percentile scores of 
these standardization samples constitute 
a set of district ____ (TT) for the 
Sanford Achievement Test. 



(1) Somefun 

(2) Honors 
College 



Thus, a test which provides many sets of 
(1) ____ (TT) based on (2) ____ 
standardization samples allows more useful 
(3) ____ of a person's score. 




112 



93 

113 
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18 



5 

1 



( 1 ) 60 

(2) 35 



XL 



41 



(1) grade 

(2) mean 



Thus, a percentile score is the (1) ____ (TT) 
of the scores that fall (2) ____ a 
given score. 



18 . 



The mean for scores 

3     4     6     7     10 

is obtained by dividing 30 by (1) ____ (#). 
The mean is (2) ____ (#). 



42 




norms 



89 






(1) norms 

(2) different 




(3) comparisons 
(or evaluations) 



113 



On Panel #1, page 25, Fidel (5th grade) obtained 
a raw score of 15. His grade equivalent 
is (1) ____ (#), because he did as 
well as the average of the (2) ____ grade. 

Would you say, on the basis of his grade 
equivalent score, that Fidel is two years 
retarded? (3) ____. 



66 



In reporting percentile scores obtained by 
senior classes in high schools throughout the 
nation, the test publishers of the Sanford 
Achievement Test were reporting a set of 
national ____ ____ (TT) for that 
test. 






90 



The Know Your Literature Test reported a set of 
percentile norms for graduate students at 
Oxford, England. Miss Earnest administered 
the test as a final examination to the Junior 
Literature class at Southville High and 
found that 95 percent of the students scored 
below the 10th percentile. "A whole year of 
teaching wasted," she concluded. This test 
____ (did, did not) provide useful 
comparisons of scores. 

114 






















Ip 



(1) percent 

(2) below 



In a 40-item English test, Frank obtained a 
percentage score of 90. Do we know Frank's 
relative standing with the other members of 
his class? ____ 



18 



(1) 5 

( 2 ) 6 



19 



The mean is obtained by ( * * * ). 




(1) 3.0 

(2) 3rd 

(3) no 



Since grade equivalents are based on mean 
scores, we can expect that approximately 
____ (#) percent of a class will fall 
below the class mean. Should these students 
be compared in their performance to the 
performances of students in lower grades? 



67 




percentile norms 



Thus, the scores obtained by standardization 

samples on a test constitute the (TT) 

of the test. 



90 



91 



did not 




A reporter for the Southville Daily Times wrote 
that the performance of the Junior Class on the 
much acclaimed "Know Your Literature" test 
proved that education in Southville High was 
only one-tenth as good as English education. 

The lack of appropriate ____ (TT) for the 
"Know Your Literature" test led this reporter 
to a misinterpretation of test scores. 



114 



115 



95 















20 





If we knew Frank's percentile score on the 
40-item English test to be 80, do we now 
know Frank's relative standing with his 
class? (1) ____. What percent of the 
class scored below Frank? (2) ____ (#). 


19 


20 


dividing the sum 
of the scores by 
the number of 
scores (or 
words to this 
effect) 


In Frank's class of 20 students, 12 obtained 
raw scores of 10, and 8 students obtained 
raw scores of 5 on the Friday Vocabulary 
Quiz. Frank decided to compute the mean 
for his class as follows: 

(10 + 5) / 2 = 7.5. Is this the correct mean 
for Frank's class? ____ 


43 


44 


50 


Imagine being told that you could solve a 
puzzle as fast as the average monkey. You 
might well reply, "That would be valuable 
information if I were a monkey, but how 
did I rate in relation to ____ of my 
own age, sex, and educational level?" 


67 


68 


norms 


Turn now to Panel #2, page 26. What 
standardization sample was used for the 
Delta Reading Test? 


91 


92 


norms 


Miss Earnest should have chosen a literature 
test which provided many sets of (1) ____ (TT) 
based on different (2) ____ ____ (TT) 
other than Oxford graduate students. 


115 


96 

116 
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21 



(1) yes 

( 2 ) 80 



20 



By giving the percent of the scores that fall 
below a given score, a percentile score 
tells ( * * * ) 



21 



I 



no 



In computing the mean of 7.5 on the previous 
frame, Frank failed to find the ____ of 
the scores. The correct mean for Frank's 
class on the Friday Vocabulary Quiz is 8. 
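The difference between averaging the two distinct score values and averaging all 20 scores can be made concrete. The sketch below uses the figures from the Friday Vocabulary Quiz frame (12 students with a raw score of 10, 8 students with a raw score of 5); the function name `mean_from_counts` is hypothetical.

```python
def mean_from_counts(counts):
    """Mean of a class given {raw_score: number_of_students}."""
    number_of_scores = sum(counts.values())
    if number_of_scores == 0:
        raise ValueError("the class must contain at least one score")
    # Each score value contributes once per student who obtained it.
    total = sum(score * students for score, students in counts.items())
    return total / number_of_scores


quiz = {10: 12, 5: 8}

shortcut = (10 + 5) / 2           # averages the two values, ignoring the counts
correct = mean_from_counts(quiz)  # sum of all 20 scores divided by 20

print(shortcut, correct)
```

The shortcut treats the class as if it had only two members; the correct mean weights each value by how many students obtained it.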



44 



45 



persons (humans) 



Grade equivalents are widely used and misused in 
education. The trouble with grade equivalent 
scores is that they may compare the performance 
of an (1) ____ to the performance 
of some (2) ____ to which he (does, 
does not) (3) ____ belong. 



68 



69 



7th grade boys 



On Panel #2, the scores obtained by 
7th grade boys on the Delta Reading Test 
constitute a set of ____ ____ (TT) 
for that test. 



92 



93 



(l) norms 



(2) standardiza- 
tion samples 



A test which provides many sets of norms 
based on ____ standardization samples 
allows for more useful comparisons of 
a person's score. 

97 



116 



117 




























22 



the relative standing 
of a person 
with some 
group (or words 
to this effect) 



Would it be possible for a person to answer 
95 percent of the items correctly on a test 
and only obtain a percentile score of 20? 



21 



22 



sum 



The mean is the (1) ____ of the scores 
divided by the (2) ____ of scores. If 
you have read this far, circle number 46 
on your answer sheet. 



45 



46 



(1) individual 

(2) group 

(3) does not 



In Miss Reallybusy's 7th grade class (Panel #1), 
Hilda's raw score of 24 might be reported 
either as a grade equivalent of 5.8 or as 
a percentile score of ____ (#). 
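The booklet does not spell out how a decimal grade equivalent such as Hilda's is obtained. One reading that reproduces the figure is straight-line interpolation between the grade means in Panel #1 (15, 20, and 30 for grades 3, 5, and 7). The sketch below rests on that assumption, and the function name `interpolated_grade_equivalent` is hypothetical.

```python
def interpolated_grade_equivalent(raw_score, grade_means):
    """Interpolate a grade equivalent between adjacent grade means.

    Assumes the mean raw score rises from each grade to the next.
    """
    grades = sorted(grade_means)
    for lower, upper in zip(grades, grades[1:]):
        low_mean, high_mean = grade_means[lower], grade_means[upper]
        if low_mean <= raw_score <= high_mean:
            fraction = (raw_score - low_mean) / (high_mean - low_mean)
            return lower + fraction * (upper - lower)
    raise ValueError("raw score lies outside the range of the grade means")


panel_1_means = {3: 15, 5: 20, 7: 30}  # grade -> mean raw score (Panel #1)
print(round(interpolated_grade_equivalent(24, panel_1_means), 1))
```

Hilda's raw score of 24 lies four-tenths of the way from the 5th grade mean to the 7th grade mean, which is where the decimal comes from under this assumption.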



69 



70 



percentile norms 



The letter ____ in Panel #2 points to 
the number of subjects in the standardization 
sample. 

(Hint: Remember, there were 1,000 in the 
sample.) 



93 



94 



different 



Congratulations! You have just finished your 
programmed instruction booklet. Please 
answer the questions found on the last page 
of your answer sheet. 

Thank you. 



117 



118 
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! 




yes 
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(1) sum 

(2) number 



20 



N 







46 



70 



94 



118 



Again, would it be possible for a person to 
answer only 30 percent of the items correctly 
on a test and obtain a percentile score of 
85? ____ 



23 



Miss Reallybusy teaches grades 3, 5, and 7. Below 
are raw scores for grade 3 on the Jones 
Arithmetic Test: 

Nancy   1        Roger      7 
Jack    5        Philbrite  11 

To find the average score for the 3rd grade, 
Miss Reallybusy computed the median score. The 
3rd grade median is (1) ____ (#). She 
might also find another kind of average by 
computing a (2) ____ (TT) score. 

47 



In reporting Hilda's raw score of 24 as a 
percentile score of 20, we are comparing her 
performance to the performance of the 7th 
grade, a (1) ____ to which she (2) ____. 



71 



Thus, the (1) ____ (TT) of a test are 
computed from scores obtained by the 
(2) ____ ____ (TT) on that test. 



95 



99 















yes 
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(1) 6 

(2) mean 



47 



(1) group 

(2) belongs 



71 



(l) norms 



(2) standardiza- 
tion sample 



95 



24 



The percentage score on a test tells us 
the percent of the (1) ____ answered 
correctly. The percentile score tells us 
the relative standing of an individual with 
some (2) ____. 



ANSWER IS IN 2ND ROW DOWN, PAGE 1, TURN TO 
PAGE 1 AND BEGIN 2ND ROW. 



24 



The mean score for the 3rd grade is: 

Nancy      1 
Jack       5 
Roger      7 
Philbrite  11        Mean = (1) ____ (#). 

The mean and median scores for the 3rd grade 
are (the same, different). (2) ____. 



48 



Jim, although still a high school senior, 
scored at the 85th percentile of freshman 
applicants to Honors College. The use of a 
____ (TT) score to report Jim's 
performance on the College Entrance Test 
compared him with a group to which he aspired 
to belong. 



72 



On Panel #2, a 7th grader must have a raw 
score of at least ____ (#) to obtain a 
percentile score of 95. 
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Appendix B 



Panels Accompanying Programed Booklets 
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PANEL #1 PAPE 25 

Raw Scores on the Sanford Achievement Test 

Name          Raw Score 

Grade 3 
Nancy         10 
Jack          13 
Roger         15 
Philbrite     22 
Total         60          3rd grade mean = 15 

Grade 5 
Fidel         15 
Mary          18 
John          20 
Louise        22 
Gene          25 
Total         100         5th grade mean = 20 

Grade 7 
Gary          16 
Hilda         24 
James         30 
Janice        38 
Don           42 
Total         150         7th grade mean = 30 




PANEL #2 



PAGE 26 



Percentile Norms for 7th Grade Boys 
on the Delta Reading Test 
(N = 1000) 



Percentile Score     Raw Score 

99                   33 or higher 
97                   29 - 32 
95                   26 - 28 
90                   24 - 25 
85                   23 
80                   22 
75                   21 
70                   19 - 20 
65                   17 - 18 
60                   15 - 16 
55                   14 
50                   13 
45                   12 
40                   11 
35                   10 
30                   9 
25                   8 
20                   7 
15                   6 
10                   5 
5                    4 
3                    3 
1                    0 - 2 



PANEL #3 



PAGE 27 



Percentile Norms for 7th Grade Girls 
on the Delta Reading Test 
(N = 800) 



Percentile Score     Raw Score 

99                   37 or higher 
97                   34 - 36 
95                   29 - 33 
90                   25 - 28 
85                   23 - 24 
80                   21 - 22 
75                   20 
70                   19 
65                   18 
60                   17 
55                   16 
50                   15 
45                   14 
40                   13 
35                   12 
30                   11 
25                   10 
20                   8 - 9 
15                   7 
10                   6 
5                    5 
3                    3 - 4 
1                    0 - 2 
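Reading a percentile score off Panels #2 and #3 is a table lookup against the norms for the right standardization sample. The sketch below encodes each panel as (lowest raw score in the band, percentile score) pairs taken from the two panels; the names `BOYS_NORMS`, `GIRLS_NORMS`, `percentile_for`, `minimum_raw_for`, and `number_below` are hypothetical.

```python
# (lowest raw score in the band, percentile score), highest band first.
BOYS_NORMS = [(33, 99), (29, 97), (26, 95), (24, 90), (23, 85), (22, 80),
              (21, 75), (19, 70), (17, 65), (15, 60), (14, 55), (13, 50),
              (12, 45), (11, 40), (10, 35), (9, 30), (8, 25), (7, 20),
              (6, 15), (5, 10), (4, 5), (3, 3), (0, 1)]    # N = 1000
GIRLS_NORMS = [(37, 99), (34, 97), (29, 95), (25, 90), (23, 85), (21, 80),
               (20, 75), (19, 70), (18, 65), (17, 60), (16, 55), (15, 50),
               (14, 45), (13, 40), (12, 35), (11, 30), (10, 25), (8, 20),
               (7, 15), (6, 10), (5, 5), (3, 3), (0, 1)]   # N = 800


def percentile_for(raw_score, norms):
    """Percentile score of the band that contains the raw score."""
    for lowest_raw, percentile in norms:
        if raw_score >= lowest_raw:
            return percentile
    raise ValueError("raw score is below the lowest band")


def minimum_raw_for(percentile, norms):
    """Lowest raw score that earns the given tabled percentile score."""
    for lowest_raw, tabled in norms:
        if tabled == percentile:
            return lowest_raw
    raise ValueError("percentile score is not listed in the norms")


def number_below(percentile, sample_size):
    """Number of subjects in the sample scoring below that percentile."""
    return percentile * sample_size // 100


# The same raw score of 21 earns different percentile scores in each sample.
print(percentile_for(21, BOYS_NORMS), percentile_for(21, GIRLS_NORMS))
```

Keeping one table per standardization sample makes the booklet's point visible in the code: the raw score stays fixed while the percentile score changes with the group it is compared to.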




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Appendix C 



Answer Sheet 



IND-M 



IND-M 



ANSWER SHEET 



Interpreting Test Results Part 1 
Form IND-M 



Keep a record here of the time you spend on this program. 



Date 



Time at Time at Number of 

Start Finish Minutes spent 



Total time 



Name 

Year in college (circle one) 1 2 3 4 



MAS 

(leave blank) 
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Page 1 IHD-M 



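                              + or - 
 1. ____                      ( ) 
 2. ____                      ( ) 
 3. ____                      ( ) 
 4. ____                      ( ) 
 5. 1) ____                   ( ) 
    2) ____                   ( ) 
 6. 1) ____                   ( ) 
    2) ____                   ( ) 
 7. ____                      ( ) 
 8. ____                      ( ) 
 9. 1) ____                   ( ) 
    2) ____                   ( ) 
    3) ____                   ( ) 
10. 1) ____                   ( ) 
    2) ____                   ( ) 
11. 1) ____                   ( ) 
    2) ____                   ( ) 
12. ____                      ( ) 
13. 1) ____                   ( ) 
    2) ____                   ( ) 
14. ____                      ( ) 
15. ____                      ( ) 
16. ____                      ( ) 
17. 1) ____                   ( ) 
    2) ____                   ( ) 
18. 1) ____                   ( ) 
    2) ____                   ( ) 
19. ____                      ( ) 
20. 1) ____                   ( ) 
    2) ____                   ( ) 
21. ____                      ( ) 
22. ____                      ( ) 
23. ____                      ( ) 
24. 1) ____                   ( ) 
    2) ____                   ( ) 
25. 1) ____                   ( ) 
    2) ____                   ( ) 
26. ____                      ( ) 
27. ____                      ( ) 
28. 1) ____                   ( ) 
    2) ____                   ( ) 
29. ____                      ( ) 
30. ____                      ( ) 
31. ____                      ( ) 
32. ____                      ( ) 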
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Page 2 IND-M 



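                              + or - 
33. (Rank Order) ____         ( ) 
    ____                      ( ) 
34. ____                      ( ) 
35. (Rank Order) ____         ( ) 
    1) ____                   ( ) 
    2) ____                   ( ) 
36. 1) ____                   ( ) 
    2) ____                   ( ) 
37. 1) ____                   ( ) 
    2) ____                   ( ) 
    3) ____                   ( ) 
38. 1) ____                   ( ) 
    2) ____                   ( ) 
39. 1) ____                   ( ) 
    2) ____                   ( ) 
40. 1) ____                   ( ) 
    2) ____                   ( ) 
41. ____                      ( ) 
42. 1) ____                   ( ) 
    2) ____                   ( ) 
43. ____                      ( ) 
44. ____                      ( ) 
45. ____                      ( ) 
46. ____                      ( ) 
47. ____                      ( ) 
48. 1) ____                   ( ) 
    2) ____                   ( ) 
49. ____                      ( ) 
50. 1) ____                   ( ) 
    2) ____                   ( ) 
    3) ____                   ( ) 
51. ____                      ( ) 
52. 1) ____                   ( ) 
    2) ____                   ( ) 
53. ____                      ( ) 
54. 1) ____                   ( ) 
    2) ____                   ( ) 
55. 1) ____                   ( ) 
    2) ____                   ( ) 
56. 1) ____                   ( ) 
    2) ____                   ( ) 
    3) ____                   ( ) 
57. ____                      ( ) 
58. 1) ____                   ( ) 
    2) ____                   ( ) 
    3) ____                   ( ) 
59. 1) ____                   ( ) 
    2) ____                   ( ) 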
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On the following questions, place an "X" next to the phrase which best 

describes your personal reaction to the programed booklet. 

1. I feel the material contained in this booklet was 

_ __ a) very interesting. 

_____ b) of some interest. 

c) of little or no interest. 



2. If I become a teacher, I think that knowing how to interpret tests 
would be 

_____ a) absolutely required. 

_____ b) extremely important. 

_____ c) of some importance. 

d) of little importance. 



3. Think back to the sequence of definitions and problems as they appeared 
in your booklet. In learning the material, this sequence was 

_____ a) very helpful. 

_____ b) of some help. 

c) of little or no help. 



4. The sequence of definitions and problems as they appeared in this 
program 

_____ a) forced me to think most of the time. 

_____ b) required some thought on my part. 

_____ c) enabled me to get the correct answer without thinking very hard. 

_____ d) required practically no thought on my part. 



5. I would have learned the material better if 

_____ a) I could have solved problems before learning the rule. 

_____ b) I could have learned the rule before solving problems. 

_____ c) (the order was satisfying as it was). 



6. In comparing this booklet with the usual textbook way of learning new 
material, I felt I learned 

_____ a) more efficiently from the booklet. 

_____ b) as efficiently from the booklet as from a typical textbook. 

_____ c) less efficiently from the booklet than from a typical textbook. 

PLEASE RETURN YOUR BOOKLET AND ANSWER SHEET TO THE INSTRUCTOR. 

Name of student 
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(please sign here) 
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Appendix G 



Criterion Tests 



CRITERION TEST 

FOR 

INTERPRETING TEST RESULTS 



John D. Krumboltz 
William W. Yabroff 

-Stanford University- 



Name of student 




Section R 



Name of student 

Exact time at beginning 
of this section is 



Answer the following questions in the spaces provided. 

1. To compute the percentage score on a test, one must know the number 
of correct answers obtained and also the 



2. In a series of scores where the mean and median are obviously 
different, extreme scores affect the (1) ____ more than 
the (2) ____. 



3. When information is sought on large numbers of subjects, and it is 
not feasible to examine each subject, investigators use what is 
known as a ____ to describe the population 
involved. 



4. The percentage score and the raw score are two ways of reporting 
test results. The difference is that a percentage score tells 
(1) ____ 
whereas a raw score gives (2) ____ 



5. An individual's rank in some group is expressed by a ______ 
score. 



6. When a student is assigned a grade equivalent score, his performance 
on a test is compared with a (1) ______ that obtained a 
(2) ______ score closest to his score. 



7. A percentile score may be defined as 






8. If you had an even number of test scores, no two of which were 
identical, could the mean fall at one of these scores? (1) ______ 
Could the median fall at one of these scores? (2) ______ 



9. The median is defined as 



10. In comparing the raw score of a student with the mean score obtained 
by some class, grade equivalent scores may create a false impression. 
A student may score at a grade equivalent above or below his correct 
grade placement. Two factors about grade equivalent scores which 
account for this are: 

(1) 

and 

(2) 



11. If you wished to compare an individual's score with a group to 
which he either belongs or aspires to belong, you would refer to 
norms reported in a table of ______ scores. 



12. Which type of average score always corresponds to the 50th 
percentile? 



13. A mean score on a test is obtained by 



14. The norms of a test are computed from the scores of subjects known 
as a ______ 



THE TIME AT THE COMPLETION OF THIS SECTION IS 
Section A 



Name of student 



Exact time at beginning 
of this section is 



(Use space in the margin for computation if needed.) 

Questions 1 through 5 are based on the table below which gives the names 
and raw scores of all 7th grade students who took the 50-item history 
examination at Midville Junior High. 

Student        Raw Score 

Adele             56 
Roscoe            46 
George            43 
Betty             40 
Katherine         33 
Harry             31 
Jeff              26 
Irvin             21 
Arthur            19 
John              15 

Sum of raw scores = 330 



1. What was Betty's percentage score? ______ 

2. Give the percentile score for Arthur. ______ 

3. Give the percentile score for George. ______ 

4. To score at the median in this set of scores, one would have to 
obtain a raw score of ______. 

5. For a student to have a grade equivalent score of 7.0, what raw 
score would he have to obtain if the test above was administered 
at the beginning of the school year? ______ 

6. Suppose you have written a test of 25 items for your class of 20 
pupils, and you find that a raw score of 21 correct answers gave a 
percentile score of 30. Generally speaking, was this a hard or 
an easy test for your class? ______ 



7. A journal reports that 19 out of every 20 teenage American boys 
score 50 or below on a physical fitness exam. If the score of 50 
is a percentile score, the norms of the test (were, were not) 

(1) ______ computed from a representative sample of 

(2) ______. 
8. To find a student's percentile score on the Tee Test of Terting 
Ability, you are instructed to compare his raw score to the entries 
in a table provided by the test publisher. This table gives the 
______ for the test. 



9. After 8 months of school all the students at East Hills Junior 
were given a test of paragraph comprehension. The mean raw score 
obtained by the 8th grade was 68. Considering only the students 
at East Hills, a raw score of 68 would correspond to a grade 
equivalent score of ______. 



10. Thirty-five out of every forty college seniors score above the 55th 
percentile on a measure of computational ability. The norms of 
this measure were not computed from the scores of a representative 
sample of ______. 

11. A test score indicates that Student X is at the 86th percentile in 
reading speed when compared to graduate students. This means that 
Student X scored (1) ______ than ______ percent of the 
(2) ______ in the standardization sample from which 
the test (3) ______ were computed. 

12. A reporter investigating farm labor conditions in Midville found 
that the DeMarco farm employed migrant workers at $1.00 per day 
and farm-machinery operators at $20 per day. The migrant workers 
made up 64% of the labor force while the farm-machinery operators 
made up 36% of those employed. The reporter concluded that the 
average wage paid to farm workers in Midville was $1.00 per day, 
and that legislation should be passed to increase farm wages. 

a. In computing the average salary, the reporter used what 
type of average score? ______ 

b. What was the sample used to describe the average wage of 
farm workers in Midville? ______ 



In the following questions, circle the letter that corresponds to the 
phrase which best completes each question. 



13. The average number of miles driven each day on a trip is computed 
by dividing the total distance traveled by the number of days on 
the road. The type of average used is (circle one) 

a) mean. 

b) median. 

c) neither of the above. 
14. Before giving a test to a class, we know that the number of people 
who will score above the median for that class will be (circle one) 

a) one-half the sum of the class scores. 

b) one-half the number of class members. 

c) about the same as those scoring above the mean. 



15. Let us assume that urban populations score significantly differently 
than rural populations on the Modern Vocabulary Test. In order for 
test results to be most useful, you should hope to find (circle one) 

a) two sets of norms, one based on a rural population, another on 
an urban population. 

b) one set of norms based on a population containing equal numbers 
of urban and rural subjects. 

c) either of the above would be equally satisfactory. 



17. A seventh grade teacher discovered that half her pupils scored below 
average on a standardized test of spelling in spite of her year-long 
efforts to teach them to spell. She should conclude that (circle one) 

a) the wrong kind of average had been computed on the test. 

b) her class was below average ability. 

c) these results were what one might usually expect. 

18. Many parents and teachers believe that all beginning 4th graders 
should have a grade equivalent score of 4.0 or better. Such a 
belief is absurd because (circle one) 

a) not all children are alike - individual differences are important. 

b) parents confuse the difference between mean and median scores. 

c) by definition, approximately half the children will score below 
the mean. 

d) class norms do not fully portray the range of abilities in a 
classroom of students. 

19. Other things being equal, the usefulness of a test increases with 
(circle one) 

a) an increase in the number of norms reported from different 
standardization samples. 

b) an increase in the mean score. 

c) a decrease in the number of cases reported for each standardization 
sample. 
Questions 20 through 23 are based on the following paragraph taken from 
the Midville Daily News: 



"No wonder we are behind in the space race! The scores obtained by 
our own junior class at Midville High last Thursday are shocking. 
Fifty-five out of sixty juniors who took the "Modern Technology 
Test" scored lower than five percent. This shows conclusively that 
high school students all over the country are not being trained in 
science. The "Modern Technology Test" was developed and standardized 
on MIT graduates for selection in their new space-training program. 
If we are to get ahead in the space race, high school students must 
do better than this!" 



20. In the above paragraph, 

a) what was the sample? ______ 

b) what was the population? ______ 

c) what was the standardization sample reported for the "Modern 
Technology Test"? 




21. The above report is not fair because it did not refer to (circle one) 



a) more standardization samples of graduate students other than 
MIT students. 

b) separate norms for men and women. 

c) norms for high school students. 

d) the mean score for MIT students so that grade equivalent scores 
could be assigned to the junior class. 



22. If one wished to estimate how much science training high school students 
in the country are receiving, one would (circle one) 



a) consult with prominent scientists. 

b) do a more careful study of the junior class at Midville. 

c) sample numbers of high school classes through the nation. 

d) insist that every high school student in the country be tested. 



23. Comparing the scores of high school juniors at Midville with the 
scores of graduate students at MIT is not helpful because (circle one) 

a) most of the Juniors do not wish to go to MIT when they graduate. 

b) the science equipment is vastly different at both schools. 

c) grade equivalent scores are not used beyond elementary school. 

d) high school students are being compared with a group to which 
they do not belong. 